Day-ahead operation involves a complex and computationally intensive optimization process to determine generator commitment schedules and dispatch. The optimization is a mixed-integer linear program (MILP), also known as security-constrained unit commitment (SCUC). Independent system operators (ISOs) run SCUC daily and need state-of-the-art algorithms to speed up the process. Existing patterns in historical information can be leveraged to reduce the SCUC model, which can save substantial time. In this paper, machine learning (ML) based classification approaches, namely logistic regression, neural networks, random forests, and k-nearest neighbors, are studied for model reduction of SCUC. The ML predictions are then aided by a feasibility layer (FL) and a post-processing technique to ensure high-quality solutions. The proposed approach is validated on several test systems, namely the IEEE 24-bus, IEEE 73-bus, and IEEE 118-bus systems, a 500-bus system, and the Polish 2383-bus system. Moreover, model reduction of stochastic SCUC (SSCUC) is demonstrated on a modified IEEE 24-bus system with renewable generation. Simulation results demonstrate high training accuracy in identifying commitment schedules, while the FL and post-processing ensure that the ML predictions do not lead to infeasible solutions, with minimal loss in solution quality.
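To make the model-reduction step concrete, the following is a minimal sketch, assuming one binary classifier per generator-hour trained on historical nodal demand; the array shapes, stand-in labels, and confidence thresholds are illustrative, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: X holds historical nodal demand profiles
# (one row per day), y holds the on/off status of one generator-hour
# taken from past SCUC solutions.
rng = np.random.default_rng(0)
X = rng.random((365, 24 * 5))          # 365 days, 24 hours x 5 buses
y = (X.sum(axis=1) > 60).astype(int)   # stand-in commitment labels

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Model reduction: fix only high-confidence predictions and leave the
# rest as binary variables; the feasibility layer / post-processing
# would then repair any fixings that make the MILP infeasible.
p_on = clf.predict_proba(X[-1:])[0, 1]
if p_on > 0.99:
    print("fix u[g, t] = 1 in the reduced SCUC")
elif p_on < 0.01:
    print("fix u[g, t] = 0 in the reduced SCUC")
else:
    print("keep u[g, t] as a binary decision variable")
```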
Security-constrained unit commitment (SCUC), used for day-ahead generation scheduling in power systems, is a mixed-integer linear programming problem that is computationally intensive. A good warm-start solution or a reduced SCUC model can save a significant amount of time. In this work, a novel approach is proposed to effectively utilize machine learning (ML) to provide a good starting solution and/or reduce the problem size of SCUC. An ML model based on a logistic regression algorithm is proposed and trained using historical nodal demand profiles and the respective commitment schedules. The ML outputs are processed and analyzed to assist SCUC. The proposed approach is validated on several standard test systems, namely the IEEE 24-bus, IEEE 73-bus, and IEEE 118-bus systems, the synthetic South Carolina 500-bus system, and the Polish 2383-bus system. Simulation results show that predictions from the proposed ML model can provide a good warm-start solution and/or reduce the number of variables and constraints in SCUC with minimal loss in solution quality, while substantially reducing computation time.
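As a sketch of how the processed ML output can warm-start the solver, here is a toy PuLP model; the three-unit system and the stand-in prediction are hypothetical, and the warm start assumes a recent PuLP release whose CBC backend accepts warmStart=True.

```python
import numpy as np
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, PULP_CBC_CMD

G, T = 3, 24                          # toy system: 3 units, 24 hours
cost = np.array([10.0, 20.0, 30.0])   # illustrative per-unit running costs

prob = LpProblem("scuc_toy", LpMinimize)
u = {(g, t): LpVariable(f"u_{g}_{t}", cat="Binary")
     for g in range(G) for t in range(T)}
prob += lpSum(cost[g] * u[g, t] for g in range(G) for t in range(T))
for t in range(T):
    # Toy stand-in for system constraints: two units online each hour.
    prob += lpSum(u[g, t] for g in range(G)) >= 2

# pred[g, t]: the commitment schedule predicted by the trained model.
pred = np.zeros((G, T), dtype=int)
pred[:2, :] = 1                       # stand-in prediction
for (g, t), var in u.items():
    var.setInitialValue(int(pred[g, t]))  # warm start from the ML output

prob.solve(PULP_CBC_CMD(warmStart=True))
```

The same predicted schedule can serve either role described in the abstract: as an initial solution, or as a set of variables to fix outright so the problem shrinks.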
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
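For reference, a minimal sketch of querying the released checkpoints through the Hugging Face hub, using the small bloom-560m variant as a stand-in for the full 176B model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Downloads the openly released weights on first use.
tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tok("Translate to French: Hello, world!", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```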
The availability of reliable, high-resolution climate and weather data is important for informing long-term decisions on climate adaptation and mitigation and for guiding rapid responses to extreme events. Forecasting models are limited by computational costs and therefore often predict quantities at a coarse spatial resolution. Statistical downscaling can provide an efficient method of upsampling low-resolution data. In this field, deep learning has been applied successfully, often using methods from the super-resolution domain in computer vision. Despite often achieving visually compelling results, such models frequently violate conservation laws when predicting physical variables. In order to conserve important physical quantities, we develop methods that guarantee physical constraints are satisfied by a deep downscaling model while also improving performance according to traditional metrics. We introduce two ways of constraining the network: a renormalization layer added at the end of the neural network, and a successive approach that scales with increasing upsampling factors. We demonstrate the applicability of our methods across different popular architectures and higher upsampling factors using ERA5 reanalysis data.
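As one way to see what a constraint of this kind can look like, here is a minimal PyTorch sketch of a multiplicative renormalization layer that forces each upsampled patch to average back to its low-resolution cell; it assumes non-negative fields, and the paper's exact constraint layers may differ.

```python
import torch
import torch.nn.functional as F

def renormalize(y_hr, x_lr, factor):
    """Rescale each (factor x factor) patch of the super-resolved field
    y_hr (N, C, H, W) so its mean equals the corresponding cell of the
    low-resolution input x_lr (N, C, H/factor, W/factor). This enforces
    exact conservation of the downscaled quantity for non-negative fields.
    """
    # Mean of each high-res patch; shape matches x_lr.
    patch_mean = F.avg_pool2d(y_hr, kernel_size=factor)
    # Multiplicative correction, broadcast back to high resolution.
    ratio = x_lr / (patch_mean + 1e-12)
    return y_hr * F.interpolate(ratio, scale_factor=factor, mode="nearest")
```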
Language models demonstrate both quantitative improvements and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; and social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Numerical simulations of Earth's weather and climate require substantial amounts of computation. This has led to interest in replacing subroutines that explicitly compute physical processes with approximate machine learning (ML) methods that are fast at inference time. Within weather and climate models, atmospheric radiative transfer (RT) calculations are especially expensive, which has made them a popular target for neural-network-based emulators. However, prior work has been hard to compare due to the lack of a comprehensive dataset and of standardized best practices for ML benchmarking. To fill this gap, we build ClimART, a large dataset with more than 10 million samples covering present and future climate conditions, based on the Canadian Earth System Model. ClimART poses several methodological challenges for the ML community, such as multiple out-of-distribution test sets, the underlying domain physics, and the trade-off between accuracy and inference speed. We also present several novel baselines that indicate shortcomings of the datasets and network architectures used in prior work. Download instructions, baselines, and code are available at: https://github.com/rolnicklab/climart
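To illustrate the emulation setting, here is a generic per-column emulator sketch with hypothetical input/output dimensions; the real data loaders and baselines live in the linked repository.

```python
import torch
import torch.nn as nn

class RTEmulator(nn.Module):
    """Sketch of a radiative-transfer emulator: maps a per-column
    atmospheric state vector to radiative outputs (e.g., fluxes or
    heating rates). Dimensions here are placeholders."""
    def __init__(self, in_dim=1024, out_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, out_dim),
        )

    def forward(self, x):
        return self.net(x)

model = RTEmulator()
x = torch.randn(32, 1024)                  # stand-in batch of columns
loss = nn.functional.mse_loss(model(x), torch.randn(32, 100))
loss.backward()
```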
We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.
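To illustrate the estimator, here is a small numpy sketch of the reweighting idea as we read it: a sample drawn once around a reference point is reused, via a Gaussian density ratio, to estimate the Gaussian-smoothed gradient at nearby query points; resque_grad and the oracle g are placeholder names.

```python
import numpy as np

def resque_grad(x, x_ref, z, g, rho):
    """Estimate the gradient of f convolved with a Gaussian density at x,
    reusing a sample z ~ N(x_ref, rho^2 I) drawn at the reference point.
    The importance weight is the density ratio N(x, rho^2 I) / N(x_ref,
    rho^2 I) evaluated at z, which keeps the estimate unbiased."""
    log_w = (np.dot(z - x_ref, z - x_ref) - np.dot(z - x, z - x)) / (2 * rho**2)
    return np.exp(log_w) * g(z)

# Toy check: for f(x) = ||x||^2 / 2 the stochastic gradient is g(z) = z,
# and the Gaussian-smoothed gradient at x is exactly x.
d, rho = 5, 0.1
x_ref = np.zeros(d)
x = x_ref + 0.01 * np.ones(d)              # query point near the reference
zs = x_ref + rho * np.random.randn(10000, d)
est = np.mean([resque_grad(x, x_ref, z, lambda z: z, rho) for z in zs], axis=0)
print(est)                                  # approximately equal to x
```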
Computed tomography (CT) has been routinely used for the diagnosis of lung diseases and recently, during the pandemic, for detecting the infectivity and severity of COVID-19 disease. One of the major concerns in using machine learning (ML) approaches for automatic processing of CT scan images in clinical settings is that these methods are trained on limited and biased subsets of publicly available COVID-19 data. This has raised concerns regarding the generalizability of these models on external datasets not seen by the model during training. To address some of these issues, in this work CT scan images from confirmed COVID-19 data obtained from one of the largest public repositories, COVIDx CT 2A, were used for training and internal validation of machine learning models. For external validation we generated the Indian-COVID-19 CT dataset, an open-source repository containing 3D CT volumes and 12096 chest CT images from 288 COVID-19 patients from India. A comparative performance evaluation of four state-of-the-art machine learning models, viz., a lightweight convolutional neural network (CNN) and three other CNN-based deep learning (DL) models, VGG-16, ResNet-50, and Inception-v3, in classifying CT images into three classes, viz., normal, non-COVID pneumonia, and COVID-19, is carried out on these two datasets. Our analysis showed that the performance of all the models is comparable on the hold-out COVIDx CT 2A test set, with 90%-99% accuracies (96% for the CNN), while on the external Indian-COVID-19 CT dataset a drop in performance is observed for all the models (8%-19%). The lightweight CNN performed the best on the external dataset (accuracy 88%) in comparison to the deeper models, indicating that a lightweight CNN generalizes better on unseen data. The data and code are made available at https://github.com/aleesuss/c19.
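A minimal sketch of the usual fine-tuning setup for one of the compared models (ResNet-50 with a 3-class head); this is not the authors' exact training configuration, and the batch here is stand-in data.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained ResNet-50 and replace the final layer for the
# 3-class task: normal / non-COVID pneumonia / COVID-19.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 3)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 224, 224)   # stand-in batch of CT slices
y = torch.randint(0, 3, (4,))     # stand-in class labels
opt.zero_grad()
loss = criterion(model(x), y)
loss.backward()
opt.step()
```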
Real-world datasets exhibit imbalances of varying types and degrees. Several techniques based on re-weighting and margin adjustment of loss are often used to enhance the performance of neural networks, particularly on minority classes. In this work, we analyze the class-imbalanced learning problem by examining the loss landscape of neural networks trained with re-weighting and margin-based techniques. Specifically, we examine the spectral density of Hessian of class-wise loss, through which we observe that the network weights converge to a saddle point in the loss landscapes of minority classes. Following this observation, we also find that optimization methods designed to escape from saddle points can be effectively used to improve generalization on minority classes. We further theoretically and empirically demonstrate that Sharpness-Aware Minimization (SAM), a recent technique that encourages convergence to flat minima, can be effectively used to escape saddle points for minority classes. Using SAM results in a 6.2\% increase in accuracy on the minority classes over the state-of-the-art Vector Scaling Loss, leading to an overall average increase of 4\% across imbalanced datasets. The code is available at: https://github.com/val-iisc/Saddle-LongTail.
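For concreteness, here is a minimal PyTorch sketch of one SAM update as it appears in the SAM literature: ascend to an approximate worst-case weight perturbation within an L2 ball of radius rho, then apply the base optimizer using the gradient taken at the perturbed weights. Names and the default rho are illustrative.

```python
import torch

def sam_step(model, loss_fn, x, y, base_opt, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) step."""
    # First pass: gradient at the current weights.
    loss_fn(model(x), y).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    # Perturb to w + eps, with eps = rho * grad / ||grad||.
    eps = []
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            e = rho * g / (norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()
    # Second pass: gradient at the perturbed weights.
    loss_fn(model(x), y).backward()
    # Restore the original weights, then descend with the sharpness-aware
    # gradient left in p.grad by the second pass.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    base_opt.step()
    model.zero_grad()
```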
Periocular refers to the region of the face that surrounds the eye socket. This is a feature-rich area that can be used by itself to determine the identity of an individual. It is especially useful when the iris or the face cannot be reliably acquired. This can be the case in unconstrained or uncooperative scenarios, where the face may appear partially occluded or the subject-to-camera distance may be high. However, it has received revived attention during the pandemic due to masked faces, leaving the ocular region as the only visible facial area, even in controlled scenarios. This paper discusses the state-of-the-art of periocular biometrics, giving an overall framework of its most significant research aspects.